Migrating HL compile and export to infer APIs #214
base: main
Conversation
Please rebase
Change-Id: If27fbc1636ed1fe9b475d07cef7c83ed7dc46ca8 Signed-off-by: Asmita Goswami <[email protected]>
Force-pushed from acf8ca3 to 12da558
QEfficient/cloud/export.py
Outdated
) # type: ignore
logger.info(f"Generated onnx_path: {onnx_model_path}, onnx_dir_path: {onnx_dir_path}")
logger.info(f"Exporting Pytorch {model_name} model to ONNX...")
qeff_model = QEFFAutoModelForCausalLM.from_pretrained(model_name, cache_dir)
We should not restrict the CLI APIs to using only AutoModelForCausalLM. They should stay generic, since we support new auto classes.
Change-Id: If27fbc1636ed1fe9b475d07cef7c83ed7dc46ca8 Signed-off-by: Asmita Goswami <[email protected]>
…transformers into hl_compile_api_infer
Signed-off-by: Asmita Goswami <[email protected]>
hf_token: Optional[str] = None,
local_model_dir: Optional[str] = None,
Why are you removing this?
@@ -92,7 +76,6 @@ def main(
    model_name=model_name,
    cache_dir=cache_dir,
    hf_token=hf_token,
    local_model_dir=local_model_dir,
?
QEfficient/cloud/infer.py
Outdated
config = AutoConfig.from_pretrained(model_name)
architecture = config.architectures[0] if config.architectures else None

model_class = architecture_mapping.get(architecture)
if not model_class:
    logger.error(f"Model class for model name {model_name} not found in mapping")
    return

qeff_model = model_class.from_pretrained(model_name)
Why? Directly use QEFFAutoModelForCausalLM.from_pretrained.
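In other words, the per-architecture lookup collapses to the direct call already used in export.py:

```python
qeff_model = QEFFAutoModelForCausalLM.from_pretrained(model_name, cache_dir)
```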
Instead of writing your own dictionary, please make use of MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES and MODEL_FOR_CAUSAL_LM_MAPPING_NAMES from transformers:

from transformers.models.auto.modeling_auto import MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES, MODEL_FOR_CAUSAL_LM_MAPPING_NAMES

Then check whether the architecture is present in the values of either of these two dictionaries and call our corresponding auto class based on that.
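A minimal sketch of that dispatch, assuming QEFFAutoModelForImageTextToText is the QEff counterpart for the image-text-to-text mapping (that class name is an assumption for illustration, and MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES requires a recent transformers version):

```python
from transformers import AutoConfig
from transformers.models.auto.modeling_auto import (
    MODEL_FOR_CAUSAL_LM_MAPPING_NAMES,
    MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES,
)

from QEfficient import QEFFAutoModelForCausalLM

model_name = "gpt2"  # example model

config = AutoConfig.from_pretrained(model_name)
architecture = config.architectures[0] if config.architectures else None

# Both transformers mappings store architecture class names as values
# (e.g. "LlamaForCausalLM"), so the config's architecture string can be
# matched against them directly instead of maintaining a hand-written dict.
if architecture in MODEL_FOR_CAUSAL_LM_MAPPING_NAMES.values():
    model_class = QEFFAutoModelForCausalLM
elif architecture in MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES.values():
    from QEfficient import QEFFAutoModelForImageTextToText  # assumed counterpart class

    model_class = QEFFAutoModelForImageTextToText
else:
    raise NotImplementedError(f"Unknown architecture={architecture}")

qeff_model = model_class.from_pretrained(model_name)
```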
QEfficient/cloud/infer.py
Outdated
# Map model's architecture to class
architecture_mapping = {
    "LlamaForCausalLM": QEFFAutoModelForCausalLM,
    "GPT2LMHeadModel": QEFFAutoModelForCausalLM,
    "MistralForCausalLM": QEFFAutoModelForCausalLM,
    "FalconForCausalLM": QEFFAutoModelForCausalLM,
    "GPTJForCausalLM": QEFFAutoModelForCausalLM,
    "GemmaForCausalLM": QEFFAutoModelForCausalLM,
    "Gemma2ForCausalLM": QEFFAutoModelForCausalLM,
    "Phi3ForCausalLM": QEFFAutoModelForCausalLM,
    "Qwen2ForCausalLM": QEFFAutoModelForCausalLM,
    "GPTBigCodeForCausalLM": QEFFAutoModelForCausalLM,
}
remove
Signed-off-by: Asmita Goswami <[email protected]>
QEfficient/cloud/export.py
Outdated
from QEfficient.exporter.export_hf_to_cloud_ai_100 import qualcomm_efficient_converter
from QEfficient.utils import check_and_assign_cache_dir, onnx_exists
from QEfficient.transformers.models.modeling_auto import QEFFAutoModelForCausalLM
Use from QEfficient import QEFFAutoModelForCausalLM instead; it's present in __init__.
QEfficient/cloud/export.py
Outdated
    full_batch_size=full_batch_size,
) # type: ignore
logger.info(f"Generated onnx_path: {onnx_model_path}, onnx_dir_path: {onnx_dir_path}")
logger.error(f"Model class for model name {model_name} not found in mapping")
raise NotImplementedError(f"Unknown architecture={architecture}, either use specific auto model class for loading the model or raise an issue for support!")
We should fail here, which will force the script to exit.
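A minimal sketch of the suggested replacement (model_class and architecture come from the surrounding code in the CLI entry point):

```python
# Fail hard instead of logging and returning, so the script exits non-zero.
if model_class is None:
    raise NotImplementedError(
        f"Unknown architecture={architecture}, either use specific auto model class "
        "for loading the model or raise an issue for support!"
    )
```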
QEfficient/cloud/export.py
Outdated
) # type: ignore
logger.info(f"Generated onnx_path: {onnx_model_path}, onnx_dir_path: {onnx_dir_path}")
logger.error(f"Model class for model name {model_name} not found in mapping")
return
remove
QEfficient/cloud/infer.py
Outdated
    enable_qnn=enable_qnn,
    qnn_config=qnn_config,
)
logger.error(f"Model class for model name {model_name} not found in mapping")
Same here: raise the error instead of logging.
QEfficient/cloud/infer.py
Outdated
config = AutoConfig.from_pretrained(model_name)
architecture = config.architectures[0] if config.architectures else None

if architecture in MODEL_FOR_CAUSAL_LM_MAPPING_NAMES.values():
    model_class = QEFFAutoModelForCausalLM
else:
    # Handle onnx model generation
    onnx_model_path = get_onnx_model_path(
        model_name, cache_dir, tokenizer, hf_token, local_model_dir, full_batch_size
    )  # , base_dir_name)

    #########
    # Compile
    #########
    _ = QEfficient.compile(
        onnx_path=onnx_model_path,
        qpc_path=os.path.dirname(
            qpc_dir_path
        ),  # We need to pass parent directory of qpc_dir_path, as the compile function handles the qpcs directory creation
        num_cores=num_cores,
        batch_size=batch_size,
        prompt_len=prompt_len,
        ctx_len=ctx_len,
        mxfp6=mxfp6,
        mxint8=mxint8,
        aic_enable_depth_first=aic_enable_depth_first,
        mos=mos,
        device_group=device_group,
        full_batch_size=full_batch_size,
        allow_mxint8_mdp_io=allow_mxint8_mdp_io,
        enable_qnn=enable_qnn,
        qnn_config=qnn_config,
    )
    logger.error(f"Model class for model name {model_name} not found in mapping")
    return

qeff_model = model_class.from_pretrained(
    pretrained_model_name_or_path=(local_model_dir if local_model_dir else model_name),
    cache_dir=cache_dir,
    hf_token=hf_token,
    full_batch_size=full_batch_size,
)
Since this code is a copy of the same logic in the export method, you can create a common method in a utils file in the cloud folder and use it from there. You could call it load_qeff_model.
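A sketch of what that shared helper could look like, say in QEfficient/cloud/utils.py (the name load_qeff_model comes from the review suggestion; the signature is an assumption pieced together from the parameters used at the two call sites):

```python
from typing import Optional

from transformers import AutoConfig
from transformers.models.auto.modeling_auto import MODEL_FOR_CAUSAL_LM_MAPPING_NAMES

from QEfficient import QEFFAutoModelForCausalLM


def load_qeff_model(
    model_name: str,
    cache_dir: Optional[str] = None,
    hf_token: Optional[str] = None,
    local_model_dir: Optional[str] = None,
    full_batch_size: Optional[int] = None,
):
    """Resolve the QEff auto class from the model config and load the model."""
    config = AutoConfig.from_pretrained(model_name)
    architecture = config.architectures[0] if config.architectures else None

    if architecture in MODEL_FOR_CAUSAL_LM_MAPPING_NAMES.values():
        model_class = QEFFAutoModelForCausalLM
    else:
        raise NotImplementedError(
            f"Unknown architecture={architecture}, either use specific auto model "
            "class for loading the model or raise an issue for support!"
        )

    return model_class.from_pretrained(
        pretrained_model_name_or_path=(local_model_dir if local_model_dir else model_name),
        cache_dir=cache_dir,
        hf_token=hf_token,
        full_batch_size=full_batch_size,
    )
```

Both export.py and infer.py could then call this one helper instead of duplicating the dispatch-and-load logic.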
Signed-off-by: Asmita Goswami <[email protected]>
Signed-off-by: Onkar Chougule <[email protected]>
Migrating HL compile API and export API to infer APIs